Non-native children speech recognition through transfer learning
This work deals with non-native children's speech and investigates both
multi-task and transfer learning approaches to adapt a multi-language Deep
Neural Network (DNN) to speakers, specifically children, learning a foreign
language. The application scenario is characterized by young students learning
English and German and reading sentences in these second languages, as well as
in their mother language. The paper analyzes and discusses techniques for
training effective DNN-based acoustic models starting from children's native
speech and performing adaptation with limited non-native audio material. A
multi-lingual model is adopted as baseline, where a common phonetic lexicon,
defined in terms of the units of the International Phonetic Alphabet (IPA), is
shared across the three languages at hand (Italian, German and English); DNN
adaptation methods based on transfer learning are evaluated on significant
non-native evaluation sets. Results show that the resulting non-native models
allow a significant improvement with respect to a mono-lingual system adapted
to speakers of the target language.
Automatic Quality Estimation for ASR System Combination
Recognizer Output Voting Error Reduction (ROVER) has been widely used for
system combination in automatic speech recognition (ASR). In order to select
the most appropriate words to insert at each position in the output
transcriptions, some ROVER extensions rely on critical information such as
confidence scores and other ASR decoder features. This information, which is
not always available, highly depends on the decoding process and sometimes
tends to overestimate the real quality of the recognized words. In this paper
we propose a novel variant of ROVER that takes advantage of ASR quality
estimation (QE) for ranking the transcriptions at "segment level" instead of:
i) relying on confidence scores, or ii) feeding ROVER with randomly ordered
hypotheses. We first introduce an effective set of features to compensate for
the absence of ASR decoder information. Then, we apply QE techniques to perform
accurate hypothesis ranking at segment-level before starting the fusion
process. The evaluation is carried out on two different tasks, in which we
respectively combine hypotheses coming from independent ASR systems and
multi-microphone recordings. In both tasks, it is assumed that the ASR decoder
information is not available. The proposed approach significantly outperforms
standard ROVER and it is competitive with two strong oracles that exploit
prior knowledge about the real quality of the hypotheses to be combined.
Compared to standard ROVER, the absolute WER improvements in the two
evaluation scenarios range from 0.5% to 7.3%.
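As a rough illustration of the segment-level ranking step described above, the
sketch below orders the competing hypotheses for one segment by a predicted
WER, best first, so that the ordered list could then be passed to a ROVER
implementation (not shown). The function name, the tuple layout, and the
example scores are assumptions made for illustration; they are not taken from
the paper.

```python
def rank_segment_hypotheses(hypotheses):
    """Order the hypotheses for one segment from best to worst predicted quality.

    `hypotheses` is a list of (system_name, text, predicted_wer) tuples, where
    predicted_wer is a hypothetical score produced by a QE component
    (lower means better predicted quality).
    """
    # sorted() is stable, so hypotheses with equal scores keep their input order.
    ranked = sorted(hypotheses, key=lambda h: h[2])
    return [(name, text) for name, text, _ in ranked]


# Example with made-up scores for a single segment.
segment = [
    ("sys_a", "the cat sat on the mat", 0.30),
    ("sys_b", "the cat sat on a mat", 0.10),
    ("sys_c", "a cat sat on the mat", 0.20),
]
ranked_segment = rank_segment_hypotheses(segment)
```

Ranking per segment rather than per full transcription is what lets the
ordering change from one segment to the next.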
DNN adaptation by automatic quality estimation of ASR hypotheses
In this paper we propose to exploit the automatic Quality Estimation (QE) of
ASR hypotheses to perform the unsupervised adaptation of a deep neural network
modeling acoustic probabilities. Our hypothesis is that significant
improvements can be achieved by: i) automatically transcribing the evaluation
data we are currently trying to recognise, and ii) selecting from it a subset
of "good quality" instances based on the word error rate (WER) scores predicted
by a QE component. To validate this hypothesis, we run several experiments on
the evaluation data sets released for the CHiME-3 challenge. First, we operate
in oracle conditions in which manual transcriptions of the evaluation data are
available, thus allowing us to compute the "true" sentence WER. In this
scenario, we perform the adaptation with variable amounts of data, which are
characterised by different levels of quality. Then, we move to realistic
conditions in which the manual transcriptions of the evaluation data are not
available. In this case, the adaptation is performed on data selected according
to the WER scores "predicted" by a QE component. Our results indicate that: i)
QE predictions allow us to closely approximate the adaptation results obtained
in oracle conditions, and ii) the overall ASR performance based on the proposed
QE-driven adaptation method is significantly better than the strong, most
recent, CHiME-3 baseline. Comment: Computer Speech & Language December 201
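The selection idea in this abstract can be sketched as follows: a sentence-level
WER is computed as word-level edit distance divided by the reference length
(the "true" WER of the oracle condition), and utterances whose WER score falls
at or below a threshold are kept for adaptation. In the realistic condition the
score would come from a QE component instead. The function names and the 0.1
threshold are assumptions for illustration only, not values from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def select_for_adaptation(utterances, max_wer=0.1):
    """Keep (utt_id, transcript) pairs whose WER score is at most max_wer.

    `utterances` is a list of (utt_id, transcript, wer_score) tuples; the score
    may be a true WER (oracle) or a QE prediction (hypothetical input here).
    """
    return [(utt_id, text) for utt_id, text, wer in utterances if wer <= max_wer]
```

For example, `word_error_rate("a b c d", "a c d")` counts one deletion over
four reference words.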
Fed-EE: Federating Heterogeneous ASR Models using Early-Exit Architectures
Automatic speech recognition models require large amounts of speech recordings for training. However, collecting such data is often cumbersome and raises privacy concerns. Federated learning has been widely used as an effective decentralized technique that collaboratively learns a shared model while keeping the data local on client devices. Unfortunately, client devices often have limited computation and communication resources, which leads to practical difficulties for large models. In addition, the heterogeneity of edge devices makes it impractical to federate a single model that fits all the different clients. Unlike the recent literature, where multiple different architectures are used, in this work we propose using early-exiting. This brings two benefits: a single model is used on a variety of devices, and federating the models is straightforward. Experiments on the public dataset TED-LIUM 3 show that our proposed approach is effective and can be combined with basic federated learning strategies. We also shed light on how to federate self-attention models for speech recognition, for which an established recipe does not exist in the literature.
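To make the "basic federated learning strategies" concrete, here is a minimal
sketch of sample-weighted parameter averaging in which each client may
contribute only the parameters it actually trained, for instance the layers
below its exit. This is one plausible reading, stated as an assumption: the
abstract does not specify the aggregation rule, and the parameter names and
data layout below are hypothetical.

```python
def federated_average(client_updates):
    """Average parameters across clients, weighted by local sample counts.

    `client_updates` is a list of (num_samples, params) pairs, where `params`
    maps a parameter name to a list of floats. A client may hold only a subset
    of the names; each parameter is averaged over the clients that report it.
    """
    totals, weights = {}, {}
    for num_samples, params in client_updates:
        for name, values in params.items():
            if name not in totals:
                totals[name] = [0.0] * len(values)
                weights[name] = 0
            # Accumulate the sample-weighted sum for this parameter.
            totals[name] = [t + num_samples * v
                            for t, v in zip(totals[name], values)]
            weights[name] += num_samples
    return {name: [t / weights[name] for t in totals[name]] for name in totals}


# Example: a small client trains only "layer1"; a larger one trains both layers.
updates = [
    (1, {"layer1": [1.0, 2.0]}),
    (3, {"layer1": [3.0, 6.0], "layer2": [4.0]}),
]
averaged = federated_average(updates)
```

Because every client shares the same lower layers, a single global model can be
assembled even when devices stop at different depths.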
Microleakage of Direct Restorations. Comparison between Bulk-Fill and Traditional Composite Resins: Systematic Review and Meta-Analysis
Since bulk-fill composites were introduced, their use for direct conservative treatment in posterior teeth has spread progressively. Their chemical structure increases the depth of cure and decreases the polymerization contraction; in this manner, bulk-fill composites can be placed in 4 mm single layers and treatment times are considerably reduced. However, the aesthetic and mechanical properties of bulk-fill resins and their impact on microleakage are still unclear. This systematic review and meta-analysis aimed to assess the risk of microleakage of direct posterior restorations made of bulk-fill versus conventional composite resins. Searches were performed on the PubMed and Scopus databases. Eligible in vivo studies, published since 2006, were reviewed. Outcomes of marginal discoloration, marginal adaptation, and recurrent caries were considered to conduct the systematic review and meta-analysis. Secondary data were examined to implement additional analysis and assess the risk of bias. Eight randomized clinical trials were analyzed, involving 778 direct restorations. The summary of RCTs led to significant but inconsistent results; marginal discoloration and recurrent caries were found to be improved by 5.1% and 1.4%, respectively, whereas marginal adaptation was reduced by 6.5%. Secondary analyses revealed that follow-up periods, the adhesive system used, and the class of carious lesions evaluated are confounding factors, and they result in a risk of bias across studies. Bulk-fill composites are innovative materials for conservative dentistry and they can be used to reduce treatment steps and duration of operative times. There are insufficient data to explore the relationship between bulk-fill composites and microleakage, and further investigations are needed.
Automatic assessment of spoken language proficiency of non-native children
This paper describes technology developed to automatically grade Italian
students (ages 9-16) on their English and German spoken language proficiency.
The students' spoken answers are first transcribed by an automatic speech
recognition (ASR) system and then scored using a feedforward neural network
(NN) that processes features extracted from the automatic transcriptions.
In-domain acoustic models, employing deep neural networks (DNNs), are derived
by adapting the parameters of an original out-of-domain DNN.
Driving ROVER with Segment-based ASR Quality Estimation
ROVER is a widely used method to combine the output of multiple automatic
speech recognition (ASR) systems. Though effective, the basic approach and its
variants suffer from potential drawbacks: i) their results depend on the order
in which the hypotheses are used to feed the combination process, ii) when
applied to combine long hypotheses, they disregard possible differences in
transcription quality at local level, iii) they often rely on word confidence
information. We address these issues by proposing a segment-based ROVER in
which hypothesis ranking is obtained from a confidence-independent ASR quality
estimation method. Our results on English data from the IWSLT2012 and
IWSLT2013 evaluation campaigns significantly outperform standard ROVER and
approximate two strong oracles
- …